Improving the performance of minimizers and winnowing schemes
Abstract
Motivation: The minimizers scheme is a method for selecting k-mers from sequences. Many bioinformatics software tools use it to bin comparable sequences, or to sample a sequence deterministically at approximately regular intervals, in order to reduce memory consumption and processing time. Although very useful, the minimizers selection procedure has undesirable behaviors (e.g., too many k-mers are selected when processing certain sequences). Some of these problems were already known to the authors of the minimizers technique, who recognized the natural lexicographic ordering of k-mers used by minimizers as their origin. Many software tools that use minimizers employ ad hoc variations of the lexicographic order to alleviate these issues.

Results: We provide an in-depth analysis of the effect of k-mer ordering on the performance of the minimizers technique. By using small universal hitting sets (a recently defined concept), we show how to significantly improve the performance of minimizers and avoid some of their worst behaviors. Based on these results, we encourage bioinformatics software developers to use an ordering based on a universal hitting set or, if that is not possible, a randomized ordering, rather than the lexicographic order. This analysis also settles negatively a conjecture (by Schleimer et al.) on the expected density of minimizers in a random sequence.

Availability and Implementation: The software used for this analysis is available on GitHub: https://github.com/gmarcais/minimizers.git

Contact: [email protected] or [email protected].
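To make the role of the k-mer ordering concrete, here is a minimal Python sketch of window-based minimizer selection with a pluggable ordering. It assumes a common definition: a window holds w consecutive k-mers, each window selects the k-mer with the smallest key, ties go to the leftmost position (tie-breaking conventions vary between tools), and density is the fraction of all k-mers that get selected. The function names (`minimizer_positions`, `density`, `lexicographic_order`, `random_order`, `uhs_first_order`) are hypothetical and are not taken from the authors' repository. `uhs_first_order` only shows how a given set of k-mers could be ranked ahead of all others; it does not construct a universal hitting set.

```python
import random
from typing import Callable, Set

# An "ordering" maps a k-mer to a comparable key; smaller keys win.
Order = Callable[[str], object]


def minimizer_positions(seq: str, k: int, w: int, order: Order) -> Set[int]:
    """Start positions of the k-mers selected as minimizers.

    Assumed definition: a window is w consecutive k-mers (w + k - 1 bases);
    each window selects the k-mer with the smallest key, ties going to the
    leftmost k-mer. A k-mer chosen by several windows is counted once.
    """
    n_kmers = len(seq) - k + 1
    if n_kmers < w:
        return set()
    keys = [order(seq[i:i + k]) for i in range(n_kmers)]
    selected: Set[int] = set()
    for start in range(n_kmers - w + 1):
        # Comparing (key, position) makes min() break ties by leftmost position
        best = min(range(start, start + w), key=lambda i: (keys[i], i))
        selected.add(best)
    return selected


def density(seq: str, k: int, w: int, order: Order) -> float:
    """Fraction of all k-mers in seq that are selected as minimizers."""
    n_kmers = len(seq) - k + 1
    if n_kmers <= 0:
        return 0.0
    return len(minimizer_positions(seq, k, w, order)) / n_kmers


def lexicographic_order(kmer: str) -> str:
    # Python string comparison sorts A < C < G < T, i.e. lexicographic order
    return kmer


def random_order(seed: int) -> Order:
    """A random but fixed ordering: each distinct k-mer gets one random key."""
    rng = random.Random(seed)
    table = {}

    def key(kmer: str) -> float:
        if kmer not in table:
            table[kmer] = rng.random()
        return table[kmer]

    return key


def uhs_first_order(hitting_set: Set[str], base: Order) -> Order:
    """Rank k-mers in hitting_set ahead of all others, then by the base order.

    hitting_set is assumed to be a precomputed universal hitting set; this
    helper does not build one.
    """
    def key(kmer: str):
        return (0 if kmer in hitting_set else 1, base(kmer))

    return key


if __name__ == "__main__":
    rng = random.Random(1)
    seq = "".join(rng.choice("ACGT") for _ in range(10_000))
    k, w = 7, 10
    for name, order in [("lexicographic", lexicographic_order),
                        ("random", random_order(42))]:
        print(f"{name:>13}: density = {density(seq, k, w, order):.3f}")
    print(f"{'2/(w+1)':>13}: {2 / (w + 1):.3f}")
```

Because every window must contain at least one selected k-mer, the density can never drop much below 1/w. For a random ordering on a random sequence, the expected density is commonly cited as roughly 2/(w+1), which the script prints for comparison; the sketch makes no claim about the exact values each ordering produces.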